Publishing eDNA data through GBIF-US & OBIS-USA

Stephen Formel

Biogeographic Science Branch

Science Analytics and Synthesis (SAS)

Example DNA-Derived Datasets

eDNA from Gulf of Mexico Ecosystems and Carbon Cruise 2021 (GOMECC-4)
  • OBIS: https://obis.org/dataset/210efc7c-4762-47ee-b4b5-22a0f436ef44
  • GBIF: https://doi.org/10.15468/sm6fpz

18S Monterey Bay Time Series: an eDNA data set from Monterey Bay, California, including years 2006, 2013 - 2016
  • OBIS: https://obis.org/dataset/62b97724-da17-4ca7-9b26-b2a22aeaab51
  • GBIF: https://doi.org/10.15468/84ntea

COI data from: Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal (Shea & Boehm, 2024)
  • OBIS: https://obis.org/dataset/54bc0e9c-e857-4216-a6ce-46cd6ae58cd7
  • GBIF: https://doi.org/10.15468/33artc

eDNA observations, concurrent with trawl survey, of marine fish in coastal New Jersey, USA 2019
  • OBIS: https://obis.org/dataset/fe2ed263-2b21-47d7-a79f-f9b911132398
  • GBIF: https://doi.org/10.15468/zsrtyb

Kenai National Wildlife Refuge Aquatic Invasive Fish Surveys
  • GBIF: https://doi.org/10.15468/kzw2j5

What are GBIF and OBIS?



GBIF and OBIS are international initiatives that aim to provide open access to biodiversity occurrence data.


Although they are best known for their publishing platforms and their role as data aggregators, both initiatives represent an investment from the international political community and a vibrant scientific community of nodes, publishers, and users of standards and practices.

What are GBIF and OBIS?

GBIF

Occurrence Records: 2,689,019,590

Datasets: 104,088

Publishers: 2,720

Papers: 10,397

OBIS

Occurrence Records: 126,818,624

Datasets: 4,852

Taxa: 149,319

What are GBIF and OBIS?

Map of the GBIF and OBIS networks. Countries where there is at least one GBIF node are in blue. Yellow points represent OBIS nodes.


What are GBIF-US and OBIS-USA?

The USGS-SAS Biogeographic Science Branch manages the US nodes, GBIF-US and OBIS-USA, which:

  • represent US interests in the international initiatives
  • advise the US scientific community on biodiversity informatics, data, and standards
  • mobilize and synthesize data to enhance its FAIRness and relevance at national and international scales

We also manage the OBIS-USA and GBIF-US Integrated Publishing Toolkits (IPTs)!

Why do we work with GBIF and OBIS?

https://ark.digitalcommonwealth.org/ark:/50959/r781wg156

Sparkly fountains need robust plumbing systems.
- Terry McConnell (International Digital Twins of the Ocean Summit 2022)

The US has been a voting participant in GBIF since 2001 and has had an OBIS node, OBIS-USA, since 2005.


The USGS is shepherding over two decades of investment in the robust plumbing needed for national and global biodiversity science and data.

How are GBIF and OBIS the same?

Both are active and energetic communities that have developed, maintained, and promoted standards and practices for biodiversity science for more than 20 years.

Both promote open data and science, and publish massive amounts of standardized biological observations through their platforms

GBIF and OBIS have signed a letter of agreement recognizing that their respective communities will benefit from more streamlined ways of working together. A stronger action plan is forthcoming.

GBIF and OBIS are built on the same data and metadata standards, and share open-source tools

Important Differences

GBIF has a global scope. OBIS includes marine occurrences only.

GBIF is funded by its participant countries; in the US, GBIF-US is funded through the NSF. OBIS is funded by UNESCO, an agency of the United Nations.

GBIF-US and OBIS-USA are unusual in that both nodes are managed by the same people.

How does it work?

This is an illustration of the GBIF workflow. The OBIS workflow is similar.

What do GBIF and OBIS do well?

  • Coordinate and serve interoperable observations of biological occurrence
  • Promote and catalyze progress for open science and FAIR data
  • Support searching of data at the occurrence level
  • Track data use and citations
  • Provide well-designed tools and guidance for self-publishing
  • Sustain an active and diverse community

So, why use GBIF and OBIS?

Off-the-shelf platforms for publishing and tracking use of biological occurrence data with minimal financial or staffing investment. USGS and Springer Nature accept GBIF as an archival repository, and publishing to OBIS-USA satisfies NOAA archiving requirements.

Sustainability - International

Both organizations rely on regular international investment for funding.

OBIS recently transitioned from a project to a program, which means it now receives core UNESCO/IOC funding and staff support to enable the activity to operate on a permanent basis.

Open standards and open science mean that migrating the data would be relatively simple in the event of a surprise shutdown.

Sustainability - US

GBIF-US is funded by NSF (~ $600k / year). There is a US Delegation that guides GBIF for US interests.

OBIS-USA is funded through US participation in UN and UNESCO.

To operate both nodes, USGS funds node staff salaries and travel. NOAA IOOS also contributes staff.

Both IPTs could easily transition to other management and continue to operate without US funding

Both secretariats have funding and training in place for new nodes and node managers

Things in Development

  • Taxonomic alignment

  • Facilitating round-tripping of data and taxonomic updates

  • Sequence searches

  • eDNA publishing tools for non-technical / small-to-medium publishers

  • Expansion of US node staff

Publishing Data

flowchart TB
   
    collect("Collect Data")
    align("Align to Darwin Core")
    create("Create a Darwin Core Archive (DwC-A)")
    register("Register DwC-A with OBIS/GBIF")
    
    collect-->align-->create-->register

    classDef default fill:#F9E9B5,stroke-width:0px,font-size:48pt;
  

DNA-Derived Data

https://docs.gbif.org/publishing-dna-derived-data/img/web/sampling-processes.en.svg

Two Standards

  1. Darwin Core (DwC): Describe your biological observation.
  2. Ecological Metadata Language (EML): Describe your dataset.

The Darwin Core Archive (DwC-A)
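
As a rough illustration of the "Create a Darwin Core Archive" step, the sketch below zips a tab-delimited occurrence table together with a meta.xml descriptor that maps each column to a Darwin Core term. It is a minimal sketch under stated assumptions, not how the IPT builds archives: the function names (`occurrence_table`, `meta_xml`, `build_archive`) and the choice of six terms are hypothetical, the record is assembled from the example tables in this deck, and the dataset-level EML file is left out entirely.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Column order of the core table; every name is a Darwin Core term.
TERMS = ["occurrenceID", "basisOfRecord", "eventDate",
         "decimalLatitude", "decimalLongitude", "scientificName"]

# One record assembled from the example tables in this deck.
RECORD = {
    "occurrenceID": "GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550",
    "basisOfRecord": "MaterialSample",
    "eventDate": "2021-09-20T18:04-04:00",
    "decimalLatitude": "29.206",
    "decimalLongitude": "-85.647",
    "scientificName": "Karenia brevis",
}


def occurrence_table(records):
    """Write records as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TERMS, delimiter="\t",
                            lineterminator="\n", quoting=csv.QUOTE_NONE)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def meta_xml():
    """Map each column of occurrence.txt to its Darwin Core term."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>'
        for i, term in enumerate(TERMS)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        '    <files><location>occurrence.txt</location></files>\n'
        '    <id index="0"/>\n'
        f'{fields}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(records):
    """Return the bytes of a zip holding the core table and its descriptor."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", occurrence_table(records))
        zf.writestr("meta.xml", meta_xml())
    return out.getvalue()
```

Calling `build_archive([RECORD])` returns bytes that could be written to a `.zip` file. A publishable archive would also carry an EML metadata file and any extension tables, and registration with OBIS/GBIF happens through an IPT rather than in code like this.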

The DNADerivedData extension

The DNADerivedData extension extends occurrence data with molecular biology metadata.

It mostly consists of terms from:

  • GGBN Data Standard
  • GSC MIxS (Minimum Information about any (X) Sequence)
  • MIQE (The Minimum Information for Publication of Quantitative Real-Time PCR Experiments).

Guidance is available: Publishing DNA-derived data through biodiversity data platforms. The guide covers barcoding, metabarcoding, metagenomics and qPCR/ddPCR.

Examples from Silliman et al., 2023

GBIF Landing Page

OBIS Landing Page

Examples of DwC Terms

Table 1: Example Occurrence Data From Silliman et al., 2023
Term Example
occurrenceID GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550
eventDate 2021-09-20T18:04-04:00
locality USA: Gulf of Mexico
locationID PANAMACITY_Sta21
decimalLatitude 29.206
decimalLongitude -85.647
geodeticDatum WGS84
minimumDepthInMeters 39
maximumDepthInMeters 39
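
Before aligning a record like the one in Table 1, a few basic sanity checks can catch common mistakes. The sketch below is an assumption-laden illustration, not the quality control that GBIF or OBIS actually run: the function name `check_location_and_time` is hypothetical, and Python's `datetime.fromisoformat` accepts only a subset of ISO 8601 (for example, it does not handle date intervals).

```python
from datetime import datetime

# Values copied from Table 1.
record = {
    "eventDate": "2021-09-20T18:04-04:00",
    "decimalLatitude": "29.206",
    "decimalLongitude": "-85.647",
    "geodeticDatum": "WGS84",
    "minimumDepthInMeters": "39",
    "maximumDepthInMeters": "39",
}


def check_location_and_time(rec):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lat = float(rec["decimalLatitude"])
    lon = float(rec["decimalLongitude"])
    if not -90 <= lat <= 90:
        problems.append("decimalLatitude outside -90..90")
    if not -180 <= lon <= 180:
        problems.append("decimalLongitude outside -180..180")
    if float(rec["minimumDepthInMeters"]) > float(rec["maximumDepthInMeters"]):
        problems.append("minimum depth is greater than maximum depth")
    try:
        # Handles single ISO 8601 timestamps only (an assumption of this sketch).
        datetime.fromisoformat(rec["eventDate"])
    except ValueError:
        problems.append("eventDate is not a parseable ISO 8601 timestamp")
    return problems
```

For the Table 1 values, `check_location_and_time(record)` finds nothing to report. Swapping latitude and longitude is a frequent error and would be caught here only when the swapped value falls outside the valid latitude range.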

Examples of DwC Terms

Table 2: Example Occurrence Data From Silliman et al., 2023
Term Example
occurrenceID GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550
basisOfRecord MaterialSample
organismQuantity 35
organismQuantityType DNA sequence reads
sampleSizeValue 12436
sampleSizeUnit DNA sequence reads
associatedSequences https://www.ncbi.nlm.nih.gov/sra/SRR26161072 | https://www.ncbi.nlm.nih.gov/biosample/SAMN37516159 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898
identificationRemarks Tourmaline; qiime2-2021.2; naive-bayes classifier, confidence (at lowest specified taxon): 0.966508279, against reference database: PR2 v5.0.1; V9 1391f-1510r region; 10.5281/zenodo.8392706. The PR2 database used for taxonomic assignment is primarily curated for protists, and may not accurately resolve metazoa, land plants or macrosporic fungi to lower taxonomic levels.
verbatimIdentification Karenia brevis
scientificName Karenia brevis
scientificNameID urn:lsid:marinespecies.org:taxname:233015
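
The quantity terms in Table 2 pair a read count for one taxon (organismQuantity) with the total reads in the sample (sampleSizeValue), so a data user can derive relative read abundance. The sketch below shows that arithmetic and splits the pipe-delimited associatedSequences value; the helper names `relative_abundance` and `sequence_links` are hypothetical.

```python
# Values copied from Table 2.
record = {
    "organismQuantity": "35",
    "organismQuantityType": "DNA sequence reads",
    "sampleSizeValue": "12436",
    "sampleSizeUnit": "DNA sequence reads",
    "associatedSequences": (
        "https://www.ncbi.nlm.nih.gov/sra/SRR26161072 | "
        "https://www.ncbi.nlm.nih.gov/biosample/SAMN37516159 | "
        "https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898"
    ),
}


def relative_abundance(rec):
    """Fraction of the sample's reads assigned to this occurrence."""
    # The ratio is only meaningful when both values use the same unit.
    if rec["organismQuantityType"] != rec["sampleSizeUnit"]:
        raise ValueError("quantity type and sample size unit differ")
    return float(rec["organismQuantity"]) / float(rec["sampleSizeValue"])


def sequence_links(rec):
    """Split the pipe-delimited associatedSequences value into separate links."""
    parts = rec["associatedSequences"].split("|")
    return [part.strip() for part in parts if part.strip()]
```

Here 35 of 12,436 reads works out to roughly 0.28% of the sample, and `sequence_links(record)` yields the three NCBI links (SRA, BioSample, BioProject) as separate strings.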

Examples of DwC Terms

Table 3: Example Occurrence Data (ambiguous taxonomy) From Silliman et al., 2023
Term Example
occurrenceID GOMECC4_YUCATAN_Sta100_Surface_B_occ1f111363da96fee3d180ddb12741d4ce
basisOfRecord MaterialSample
organismQuantity 18
organismQuantityType DNA sequence reads
sampleSizeValue 12151
sampleSizeUnit DNA sequence reads
associatedSequences https://www.ncbi.nlm.nih.gov/sra/SRR26160967 | https://www.ncbi.nlm.nih.gov/biosample/SAMN37516435 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA887898
identificationRemarks Tourmaline; qiime2-2021.2; naive-bayes classifier, confidence (at lowest specified taxon): 0.960018364, against reference database: PR2 v5.0.1; V9 1391f-1510r region; 10.5281/zenodo.8392706. The PR2 database used for taxonomic assignment is primarily curated for protists, and may not accurately resolve metazoa, land plants or macrosporic fungi to lower taxonomic levels.
verbatimIdentification Unassigned
scientificName Biota incertae sedis
scientificNameID urn:lsid:marinespecies.org:taxname:12
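
Tables 2 and 3 both carry a WoRMS LSID in scientificNameID, and the number at the end of that LSID is the AphiaID. The sketch below pulls it out and flags records whose verbatimIdentification is "Unassigned". It assumes the LSID pattern shown in these tables, and the helper names `aphia_id_from_lsid` and `is_unassigned` are hypothetical; treating "Unassigned" as the marker for ambiguous taxonomy is a convention of this dataset, not a Darwin Core rule.

```python
WORMS_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def aphia_id_from_lsid(lsid):
    """Return the trailing AphiaID as an int, or None if the LSID is not from WoRMS."""
    if not lsid.startswith(WORMS_PREFIX):
        return None
    tail = lsid[len(WORMS_PREFIX):]
    return int(tail) if tail.isdigit() else None


def is_unassigned(rec):
    """True when the pipeline could not assign a taxon (dataset-specific convention)."""
    return rec["verbatimIdentification"].strip().lower() == "unassigned"


# Values copied from Tables 2 and 3.
resolved = {
    "verbatimIdentification": "Karenia brevis",
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:233015",
}
ambiguous = {
    "verbatimIdentification": "Unassigned",
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:12",
}
```

A data user who wants only taxonomically resolved occurrences could filter on `is_unassigned` or on the AphiaID for Biota incertae sedis.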

Examples of DwC Terms

Table 4: Example DNADerivedData Extension Data From Silliman et al., 2023
Term Example
occurrenceID GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550
DNA_sequence GCTCCTACCGATTGAGTGATCCGGTGAATAATTCGGACTGCCGCAGTGTTCAGATCCTGAACGTTGCAGTGGAAAGTTTAGTGAACCTTATCACTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC
concentration 1.177
target_gene 18S rRNA
target_subfragment V9
pcr_primer_forward GTACACACCGCCCGTC
pcr_primer_reverse TGATCCTTCTGCAGGTTCACCTAC
pcr_primer_name_forward 1391f
pcr_primer_name_reverse EukBr
pcr_primer_reference 10.1371/journal.pone.0006372
seq_meth Illumina MiSeq 2x250
otu_class_appr Tourmaline; qiime2-2021.2; dada2; ASV
otu_db PR2 v5.0.1; V9 1391f-1510r region; 10.5281/zenodo.8392706
otu_seq_comp_appr Tourmaline; qiime2-2021.2; naive-bayes classifier
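
Table 4 shares its occurrenceID with Table 2, which is how an extension row is tied back to its occurrence in the core table. The sketch below joins the two on that identifier and checks that primer strings contain only IUPAC nucleotide codes. It is a simplified illustration with hypothetical helper names (`join_extension`, `is_iupac_dna`) and only a few of the terms from each table.

```python
from collections import defaultdict

OCC_ID = "GOMECC4_PANAMACITY_Sta21_DCM_A_occ7c8b2f5e16137114160dfd4001f67550"

# A core (occurrence) row and an extension row, abbreviated from Tables 2 and 4.
core_rows = [{"occurrenceID": OCC_ID, "scientificName": "Karenia brevis"}]
extension_rows = [{
    "occurrenceID": OCC_ID,
    "target_gene": "18S rRNA",
    "target_subfragment": "V9",
    "pcr_primer_forward": "GTACACACCGCCCGTC",
    "pcr_primer_reverse": "TGATCCTTCTGCAGGTTCACCTAC",
}]

# IUPAC nucleotide codes, including ambiguity codes used in degenerate primers.
IUPAC = set("ACGTRYSWKMBDHVN")


def join_extension(core, extension):
    """Attach every extension row to the core row with the same occurrenceID."""
    by_id = defaultdict(list)
    for row in extension:
        by_id[row["occurrenceID"]].append(row)
    return [{**c, "dna": by_id.get(c["occurrenceID"], [])} for c in core]


def is_iupac_dna(seq):
    """True when seq is non-empty and uses only IUPAC nucleotide codes."""
    return bool(seq) and set(seq.upper()) <= IUPAC


joined = join_extension(core_rows, extension_rows)
```

After the join, each occurrence carries a list of its DNA-derived rows, so the taxon name and the marker metadata (target gene, primers) can be read together.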

You don’t have to use OBIS/GBIF.

So, why use GBIF and OBIS?

Off-the-shelf platforms for publishing and tracking use of biological occurrence data with minimal financial or staffing investment.

Thank you!

Compliments? Complaints? Ideas?

sformel@usgs.gov